Maximum Entropy Learning Model for Biomedical Mention Classification
Authors: Ekaterina Shutova
I, Ekaterina Shutova, of Hughes Hall, being a candidate for the M.Phil in Computer Speech, Text and Internet Technology, hereby declare that this dissertation and the work described in it are my own work, unaided except as may be specified below, and that the dissertation does not contain material that has already been used to any substantial extent for a comparable purpose.

Abstract

"Genes can be named on the basis of a mutant phenotype or on the basis of the predicted protein product or RNA product" (Genetic Nomenclature, J. Hodgkin). As a result, the same gene name is often used in the biomedical literature to refer to more than one entity: it is very common for a gene name to refer to the protein it produces, or to some other product. Therefore, when performing information extraction tasks, identifying gene names is not sufficient. It is also necessary to distinguish between all the biological entities for which the same gene name is used, and this is the key objective of this project.

The proposed approach consists of training a Maximum Entropy classifier for this task using a large, automatically created biomedical corpus, whereas other systems are trained on expensive manually annotated data. First, a rule-based recognizer is developed that uses information from an ontology to classify biological mentions into seven biotypes. This baseline system is then applied to a large corpus of biomedical text in order to tag it, and the resulting corpus constitutes the training data for the machine learning system. As expected, the bootstrapped Maximum Entropy classifier trained on the automatically created data performs better than the baseline recognizer itself, which demonstrates the effectiveness of the adopted approach. Furthermore, the machine learning techniques investigated and developed within the framework of the project can be applied to text mining in any domain.
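The bootstrapping pipeline described above (rule-based baseline, then automatic tagging of a raw corpus, then training a Maximum Entropy classifier on those tags) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The trigger-word rules in `RULES`, the three labels (standing in for the seven biotypes), the bag-of-words features, the toy corpus, and the hyperparameters are all hypothetical. The model uses the standard conditional MaxEnt form p(y | x) = exp(Σ_i λ_i f_i(x, y)) / Z(x). It is fitted here by plain gradient ascent on an L2-penalised log-likelihood, because the source does not say which training algorithm was used.

```python
import math
from collections import defaultdict

# HYPOTHETICAL trigger-word rules standing in for the ontology-driven
# baseline; the real recognizer and its seven biotypes are not reproduced.
RULES = {
    "protein": {"protein", "binds", "antibody"},
    "mRNA": {"mrna", "transcript", "transcribed"},
    "gene": {"gene", "mutant", "allele"},
}


def rule_label(context):
    """Baseline: return the biotype whose triggers overlap the context most,
    or None if no rule fires (the mention stays untagged)."""
    words = set(context.lower().split())
    best, best_hits = None, 0
    for biotype, triggers in RULES.items():
        hits = len(words & triggers)
        if hits > best_hits:
            best, best_hits = biotype, hits
    return best


def features(context):
    """Bag-of-words context features (a simplifying assumption)."""
    return ["w=" + w for w in context.lower().split()]


class MaxEnt:
    """Conditional maximum entropy model p(y|x) proportional to
    exp(sum_i lambda_i f_i(x, y)), fitted by batch gradient ascent
    on an L2-penalised log-likelihood."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        scores = {y: sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())  # partition function Z(x)
        return {y: e / z for y, e in exps.items()}

    def train(self, data, epochs=50, lr=0.5, l2=0.01):
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0  # observed feature count
                    for y in self.labels:
                        grad[(f, y)] -= p[y]  # expected count under the model
            for key in set(grad) | set(self.w):
                self.w[key] += lr * (grad[key] - l2 * self.w[key]) / len(data)
        return self

    def predict(self, context):
        p = self.probs(features(context))
        return max(p, key=p.get)


def bootstrap(raw_contexts):
    """Tag raw text with the baseline, then train MaxEnt on those tags."""
    tagged = [(features(c), rule_label(c)) for c in raw_contexts]
    data = [(f, y) for f, y in tagged if y is not None]  # drop untagged mentions
    return MaxEnt({y for _, y in data}).train(data)


# Toy "unlabelled corpus" of mention contexts, invented for illustration.
CORPUS = [
    "the protein binds the receptor",
    "an antibody against the protein",
    "the mrna transcript was detected",
    "mutant allele of the gene",
    "the gene was transcribed into mrna",
    "a surprising result",  # no rule fires, so it is left out of training
]

if __name__ == "__main__":
    model = bootstrap(CORPUS)
    # "receptor" is not a trigger word, but it co-occurred with protein
    # triggers in the auto-tagged data, so the classifier predicts "protein".
    print(model.predict("receptor"))
```

In this sketch the classifier learns weights for context words that no rule mentions, such as "receptor". That is one way a classifier bootstrapped from a baseline's output can label mentions the baseline itself would leave untagged.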
Similar resources
Maximum entropy modeling for mining patient medication status from free text
Using a classification scheme of patient medication status we sought to recognize and categorize medications mentioned in the unrestricted text of clinical documents generated in clinical practice. The categories refer to the patient's status with respect to the medication such as discontinuation, start or initiation, and continuation of a given medication. This categorization is performed with...
Classification of Right/Left Hand Motor Imagery by Effective Connectivity Based on Transfer Entropy in EEG Signal
The right and left hand Motor Imagery (MI) analysis based on the electroencephalogram (EEG) signal can directly link the central nervous system to a computer or a device. This study aims to identify a set of robust and nonlinear effective brain connectivity features quantified by transfer entropy (TE) to characterize the relationship between brain regions from EEG signals and create a hierarchi...
Maximum Entropy Learning Model for Biomedical Semantic Type Induction
In the biomedical literature the gene names tend to be used to refer to biomedical entities other than genes. Therefore, when performing information extraction tasks it is necessary to distinguish between all such biomedical entities. Our approach consists of training a Maximum Entropy classifier for this task using a large automatically-created training corpus, as opposed to manually annotated...
Clothing Product Reviews Mining Based on Machine Learning
This paper used the method of machine learning to study clothing product reviews classification based on big enterprise data. Taking Taobao clothing reviews as the object, it firstly excavated review themes from reviews corpus by association rules, and then searched review themes related to the categories by a method of mutual information to enrich the review themes. In the process of building ...
Biomedical Named Entity Recognition System
We propose a machine learning approach, using a Maximum Entropy (ME) model to construct a Named Entity Recognition (NER) classifier to retrieve biomedical names from texts. In experiments, we utilize a blend of various linguistic features incorporated into the ME model to assign class labels and location within an entity sequence, and a postprocessing strategy for corrections to sequences of ta...
Publication date: 2007